
[deviantart] add /view URL support #3367

Merged: 1 commit merged on Dec 17, 2022


@the-blank-x (Contributor)

URLs I found when searching for deviantart.com/view on web.archive.org:

  • https://www.deviantart.com/view/<id>

    • http:// redirects you to https://
    • deviantart.com redirects you to www.deviantart.com
    • One trailing slash in the URL is ignored
    • If the deviation exists, it redirects you to https://www.deviantart.com/<username>/<art|journal>/<id>
    • If no deviation exists, it sends a 404
    • Some archived URLs carry an offset parameter, e.g. https://www.deviantart.com/view/14864502/?offset=80 (Wayback Machine); I don't know what it does
  • https://www.deviantart.com/view.php?id=<id>
    https://www.deviantart.com/view-full.php?id=<id>

    • http:// redirects you to https://
    • deviantart.com redirects you to www.deviantart.com
    • Each trailing slash triggers a redirect to the same URL with one trailing slash removed
    • Currently sends a 404 regardless of whether the deviation exists
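The URL behaviour listed above can be turned into a small ID extractor. This is a minimal sketch using only the standard library's urllib.parse, not gallery-dl's code; the helper name view_url_to_id is hypothetical, and it encodes nothing beyond the observations in the list (one ignored trailing slash for /view/, any number of trailing slashes for the .php forms, and an id query parameter). The .php forms currently 404, but the index is still recoverable from the query string.

```python
import re
from urllib.parse import urlsplit, parse_qs

_INDEX = re.compile(r"[0-9]+")


def view_url_to_id(url):
    """Return the deviation index from a /view-style URL, or None.

    Hypothetical helper; it only encodes the observations listed above.
    """
    if "://" not in url:
        url = "https://" + url  # accept scheme-less input like deviantart.com/...
    parts = urlsplit(url)
    # http:// and the bare domain both redirect to https://www., so accept both.
    if parts.hostname not in ("deviantart.com", "www.deviantart.com"):
        return None

    path = parts.path
    if path.startswith("/view/"):
        # Exactly one trailing slash is ignored. Query strings such as
        # ?offset=80 are not part of the path, so they are dropped here.
        if path.endswith("/"):
            path = path[:-1]
        ident = path[len("/view/"):]
        return ident if _INDEX.fullmatch(ident) else None

    # view.php / view-full.php: redirects strip every trailing slash.
    if path.rstrip("/") in ("/view.php", "/view-full.php"):
        ids = parse_qs(parts.query).get("id")
        if ids and _INDEX.fullmatch(ids[0]):
            return ids[0]
    return None
```

For example, view_url_to_id("https://www.deviantart.com/view/14864502/?offset=80") returns "14864502", while a /view/ URL with two trailing slashes returns None, since only one is documented as ignored.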

@ClosedPort22 (Contributor) commented Dec 7, 2022

The /view URL comes in very handy when only the index of the deviation (or its base36 equivalent) is known.
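A short sketch of going from a base36 token to a /view URL, assuming the token is simply the decimal deviation index written in base 36 with the digits 0-9 and a-z (the comment above implies this but does not spell out the alphabet). The helper names are hypothetical; Python's built-in int() already parses base 36, so only the reverse direction needs a loop.

```python
import string

# Assumed alphabet: 0-9 followed by a-z, 36 symbols in total.
_DIGITS = string.digits + string.ascii_lowercase


def base36_to_index(token):
    # int() accepts bases up to 36 and is case-insensitive.
    return int(token, 36)


def index_to_base36(index):
    if index < 0:
        raise ValueError("index must be non-negative")
    if index == 0:
        return "0"
    out = []
    while index:
        index, rem = divmod(index, 36)
        out.append(_DIGITS[rem])
    return "".join(reversed(out))


def view_url(token):
    """Build a /view URL from a base36 token (hypothetical helper)."""
    return "https://www.deviantart.com/view/{}".format(base36_to_index(token))
```

For instance, base36_to_index("zz") is 1295 and view_url("10") is "https://www.deviantart.com/view/36".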

```python
class DeviantartViewExtractor(DeviantartExtractor):
    """Extractor for single deviations from a /view URL"""
    subcategory = "view"
    pattern = (r"(?:https?://)?(?:www\.)?deviantart\.com/()()"
```
Contributor (review comment):

Are the ()() groups in-development placeholders?

Contributor, Author (reply):

No. The first two groups are treated as an optional username, so URLs without a username need two empty groups, as in DeviantartWatchPostsExtractor.
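A small sketch of why the empty groups matter, with Python's re module. BASE_PATTERN here is a hypothetical, simplified stand-in for gallery-dl's: it is only assumed to share the property described in the reply, namely that groups 1 and 2 hold the username. The username() helper is likewise illustrative, not gallery-dl's actual base-class code.

```python
import re

# Hypothetical, simplified stand-in for gallery-dl's BASE_PATTERN. It is
# assumed to have two username groups: one for deviantart.com/<user> URLs
# and one for <user>.deviantart.com URLs.
BASE_PATTERN = (r"(?:https?://)?(?:(?:www\.)?deviantart\.com/([\w-]+)"
                r"|([\w-]+)\.deviantart\.com)")

DEVIATION = re.compile(BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)")

# /view URLs carry no username, so "()()" adds two groups that always match
# the empty string. That keeps the deviation ID in group 3 instead of group 1.
VIEW = re.compile(r"(?:https?://)?(?:www\.)?deviantart\.com/()()view/(\d+)")


def username(match):
    # Shared code reads the username from group 1 or group 2; empty
    # strings are falsy, so a /view match yields None here.
    return match.group(1) or match.group(2) or None
```

Without the empty groups, match.group(1) would hold the deviation ID and be misread as a username, and match.group(2) would raise IndexError ("no such group").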

@mikf (Owner) commented Dec 17, 2022

It would also be possible to slightly extend the existing DeviantartDeviationExtractor to get the same result, without an extra HEAD request and without much extra code.

Deviantart does not seem to care about username, type, or slug in deviation URLs, so we can just use default names if none are given.

```diff
diff --git a/gallery_dl/extractor/deviantart.py b/gallery_dl/extractor/deviantart.py
index df59be4a..597abccf 100644
--- a/gallery_dl/extractor/deviantart.py
+++ b/gallery_dl/extractor/deviantart.py
@@ -854,7 +854,9 @@ class DeviantartDeviationExtractor(DeviantartExtractor):
     """Extractor for single deviations"""
     subcategory = "deviation"
     archive_fmt = "g_{_username}_{index}.{extension}"
-    pattern = BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)"
+    pattern = (BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)"
+               r"|(?:https?://)?(?:www\.)?deviantart\.com/"
+               r"(?:view/|view(?:-full)?\.php/*\?(?:[^#]+&)?id=)(\d+)")
     test = (
         (("https://www.deviantart.com/shimoda7/art/For-the-sake-10073852"), {
             "options": (("original", 0),),
@@ -919,11 +921,12 @@ class DeviantartDeviationExtractor(DeviantartExtractor):
     def __init__(self, match):
         DeviantartExtractor.__init__(self, match)
         self.type = match.group(3)
-        self.deviation_id = match.group(4)
+        self.deviation_id = match.group(4) or match.group(5)
 
     def deviations(self):
         url = "{}/{}/{}/{}".format(
-            self.root, self.user, self.type, self.deviation_id)
+            self.root, self.user or "u", self.type or "art", self.deviation_id)
+
         uuid = text.extract(self._limited_request(url).text,
                             '"deviationUuid\\":\\"', '\\')[0]
         if not uuid:
```
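To see how the combined pattern in the diff resolves the ID and falls back to placeholder path segments, here is a standalone sketch. The alternation after BASE_PATTERN is copied from the diff, but BASE_PATTERN itself is a hypothetical, simplified stand-in assumed to contribute exactly two username groups, and deviation_url() is an illustrative helper, not gallery-dl's extractor.

```python
import re

# Hypothetical, simplified stand-in for gallery-dl's BASE_PATTERN; this
# sketch only relies on it contributing two username groups (1 and 2).
BASE_PATTERN = (r"(?:https?://)?(?:(?:www\.)?deviantart\.com/([\w-]+)"
                r"|([\w-]+)\.deviantart\.com)")

# Alternation copied from the diff: groups 3 and 4 for regular URLs,
# group 5 for /view/ and view(-full).php URLs.
PATTERN = re.compile(
    BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)"
    r"|(?:https?://)?(?:www\.)?deviantart\.com/"
    r"(?:view/|view(?:-full)?\.php/*\?(?:[^#]+&)?id=)(\d+)")

ROOT = "https://www.deviantart.com"


def deviation_url(url):
    """Return the URL that would be requested, or None if nothing matches."""
    match = PATTERN.match(url)
    if not match:
        return None
    user = match.group(1) or match.group(2)
    dtype = match.group(3)
    deviation_id = match.group(4) or match.group(5)
    # Placeholder defaults as in the diff; DeviantArt appears to ignore them.
    return "{}/{}/{}/{}".format(ROOT, user or "u", dtype or "art", deviation_id)
```

With these definitions, "https://www.deviantart.com/view/14864502/?offset=80" maps to "https://www.deviantart.com/u/art/14864502", while a regular art URL keeps its real username and type.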

@the-blank-x (Contributor, Author)

I was a bit on the fence about guessing the full URL, since there's no guarantee that it will keep working, but I suppose it works.

@mikf merged commit 8d75855 into mikf:master on Dec 17, 2022

4 participants